74 research outputs found

    Dense vs. Sparse representations for news stream clustering

    The abundance of news being generated on a daily basis has made it hard, if not impossible, to monitor all news developments. Thus, there is an increasing need for accurate tools that can organize the news for easier exploration. Typically, this means clustering the news stream and then connecting the clusters into story lines. Here, we focus on the clustering step, using a local topic graph and a community detection algorithm. Traditionally, news clustering was done using sparse vector representations with TF–IDF weighting, but more recently dense representations have emerged as a popular alternative. Here, we compare these two representations, as well as combinations thereof. The evaluation results on a standard dataset show a sizeable improvement over the state of the art, both for the standard F1 measure and for a BCubed version thereof, which we argue is more suitable for the task.
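A minimal sketch of the sparse side of this comparison, with made-up documents: TF–IDF vectors, a cosine-similarity graph thresholded into edges, and connected components standing in for the community detection step (the real system's graph construction and clustering algorithm are more involved):

```python
import math
from collections import Counter, defaultdict

def tfidf_vectors(docs):
    """Sparse TF-IDF vectors represented as {term: weight} dicts."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        # Drop terms occurring in every document (idf would be zero).
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf if df[t] < n})
    return vecs

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster(docs, threshold=0.1):
    """Link documents whose similarity exceeds the threshold and return
    connected components as clusters (a stand-in for community detection
    on the local topic graph)."""
    vecs = tfidf_vectors(docs)
    parent = list(range(len(docs)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) > threshold:
                parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(len(docs)):
        groups[find(i)].append(i)
    return sorted(sorted(g) for g in groups.values())

docs = [
    "quake hits coastal city overnight",
    "earthquake strikes coastal city many hurt",
    "stocks rally as markets open higher",
    "markets open higher stocks rally continues",
]
print(cluster(docs))  # → [[0, 1], [2, 3]]
```

Swapping the TF–IDF dicts for dense sentence embeddings, or combining the two similarity scores, gives the kinds of combinations the abstract compares.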

    Extending local features with contextual information in graph kernels

    Graph kernels are usually defined in terms of simpler kernels over local substructures of the original graphs. Different kernels consider different types of substructures. However, in some cases they have similar predictive performance, probably because the substructures can be interpreted as approximations of the subgraphs they induce. In this paper, we propose to associate with each feature a piece of information about the context in which the feature appears in the graph. A substructure appearing in two different graphs will match only if it appears with the same context in both graphs. We propose a kernel based on this idea that considers trees as substructures, and where the contexts are features too. The kernel is inspired by the framework in [6], even though it is not part of it. We give an efficient algorithm for computing the kernel and show promising results on real-world graph classification datasets. Comment: To appear in ICONIP 201
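A toy version of the core idea, matching each node label (the "feature") together with the multiset of its neighbours' labels (its "context"), rather than the paper's tree substructures; the graphs and labels are invented for illustration:

```python
from collections import Counter

def contextual_features(adj, labels):
    """Each node contributes a (feature, context) pair: its own label
    plus the sorted labels of its neighbours (a simplified stand-in for
    the paper's substructures and contexts)."""
    feats = Counter()
    for node, neigh in adj.items():
        context = tuple(sorted(labels[n] for n in neigh))
        feats[(labels[node], context)] += 1
    return feats

def kernel(g1, g2):
    """Count (feature, context) pairs shared by both graphs: a feature
    matches only if it occurs with the same context in each graph."""
    f1, f2 = contextual_features(*g1), contextual_features(*g2)
    return sum(f1[k] * f2[k] for k in f1 if k in f2)

# Two small labelled paths: A-B-C and A-B-A, as (adjacency, labels).
g1 = ({0: [1], 1: [0, 2], 2: [1]}, {0: "A", 1: "B", 2: "C"})
g2 = ({0: [1], 1: [0, 2], 2: [1]}, {0: "A", 1: "B", 2: "A"})
print(kernel(g1, g2))  # → 2
```

Note that the label "B" appears in both graphs, but with different neighbourhoods, so with contexts it contributes nothing to the kernel value; a purely local feature kernel would have matched it.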

    Qlusty: Quick and dirty generation of event videos from written media coverage

    Qlusty automatically generates videos describing the coverage of the same event by different news outlets. Through four modules, it identifies events, de-duplicates notes, ranks them according to coverage, and queries for images to generate an overview video. In this manuscript, we present our preliminary models, including quantitative evaluations of the former two and a qualitative analysis of the latter two. The results show the potential for achieving our main aim: contributing to breaking the information bubble that is so common in the current news landscape.
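The de-duplication module could, in its simplest form, be a near-duplicate filter over word n-gram overlap; a hedged sketch with invented notes (the actual Qlusty models are more sophisticated than this):

```python
def shingles(text, k=3):
    """Word k-grams of a text, as a set."""
    toks = text.lower().split()
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}

def dedupe(notes, threshold=0.5):
    """Keep a note only if its 3-gram Jaccard overlap with every
    previously kept note stays below the threshold."""
    kept = []
    for note in notes:
        s = shingles(note)
        is_dup = any(
            len(s & shingles(prev)) / len(s | shingles(prev)) >= threshold
            for prev in kept
        )
        if not is_dup:
            kept.append(note)
    return kept

notes = [
    "city council approves the new budget plan today",
    "city council approves the new budget plan this morning",
    "local team wins championship after dramatic final",
]
print(len(dedupe(notes)))  # → 2 (the second note is a near-duplicate)
```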

    Prta: A System to Support the Analysis of Propaganda Techniques in the News

    Recent events, such as the 2016 US Presidential Campaign, Brexit, and the COVID-19 "infodemic", have brought into the spotlight the dangers of online disinformation. There has been a lot of research focusing on fact-checking and disinformation detection. However, little attention has been paid to the specific rhetorical and psychological techniques used to convey propaganda messages. Revealing the use of such techniques can help promote media literacy and critical thinking, and eventually contribute to limiting the impact of "fake news" and disinformation campaigns. Prta (Propaganda Persuasion Techniques Analyzer) allows users to explore articles crawled on a regular basis, highlighting the spans in which propaganda techniques occur, and to compare articles on the basis of their use of such techniques. The system further reports statistics about the use of these techniques, overall and over time, or according to filtering criteria specified by the user based on time interval, keywords, and/or political orientation of the media. Moreover, it allows users to analyze any text or URL through a dedicated interface or via an API. The system is available online: https://www.tanbih.org/prta

    Thread-level information for comment classification in community question answering

    Community Question Answering (cQA) is a new application of QA in social contexts (e.g., fora). It presents interesting new challenges and research directions, e.g., exploiting the dependencies between the different comments in a thread to select the best answer for a given question. In this paper, we explore two ways of modeling such dependencies: (i) by designing specific features that look globally at the thread; and (ii) by applying structured prediction models. We trained and evaluated our models on data from SemEval-2015 Task 3 on Answer Selection in cQA. Our experiments show that: (i) the thread-level features consistently improve the performance of a variety of machine learning models, yielding state-of-the-art results; and (ii) the sequential dependencies between the answer labels captured by structured prediction models are not enough to improve the results, indicating that more information is needed in the joint model.
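As an illustration of features that "look globally at the thread", here is a small sketch; the specific features below (position, asker identity, prior activity, relative length) are our own guesses in the spirit of the paper, not the authors' exact feature set:

```python
def thread_features(question_author, comments):
    """Per-comment features computed over the whole thread.
    comments is a list of (author, text) pairs in thread order."""
    n = len(comments)
    max_len = max(len(text.split()) for _, text in comments)
    seen_authors = set()
    feats = []
    for i, (author, text) in enumerate(comments):
        feats.append({
            # Where in the thread the comment sits (0 = first, 1 = last).
            "position": i / (n - 1) if n > 1 else 0.0,
            # Whether the commenter is the person who asked the question.
            "is_asker": float(author == question_author),
            # Whether this author already posted earlier in the thread.
            "author_posted_before": float(author in seen_authors),
            # Comment length relative to the longest comment in the thread.
            "length_ratio": len(text.split()) / max_len,
        })
        seen_authors.add(author)
    return feats

comments = [
    ("bob", "try reinstalling the driver"),
    ("alice", "which driver exactly"),
    ("bob", "the network one"),
]
f = thread_features("alice", comments)
print(f[1]["is_asker"], f[2]["author_posted_before"])  # → 1.0 1.0
```

Feature vectors like these can be fed to any standard classifier alongside the usual local comment features.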

    Overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims. Task 1: Check-Worthiness

    We present an overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims, with focus on Task 1: Check-Worthiness. The task asks to predict which claims in a political debate should be prioritized for fact-checking. In particular, given a debate or a political speech, the goal was to produce a ranked list of its sentences based on their worthiness for fact-checking. We offered the task in both English and Arabic, based on debates from the 2016 US Presidential Campaign, as well as on some speeches during and after the campaign. A total of 30 teams registered to participate in the Lab, and seven teams actually submitted systems for Task 1. The most successful approaches used by the participants relied on recurrent and multi-layer neural networks, as well as on combinations of distributional representations, on matching the claims' vocabulary against lexicons, and on measures of syntactic dependency. The best systems achieved a mean average precision of 0.18 and 0.15 on the English and Arabic test datasets, respectively. This leaves large room for further improvement, and thus we release all datasets and the scoring scripts, which should enable further research in check-worthiness estimation.
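The ranking metric used here, mean average precision, can be computed as follows (the toy debates and gold labels are invented for the example):

```python
def average_precision(ranked_labels):
    """AP for one debate: ranked_labels[i] is 1 if the sentence ranked
    at position i is check-worthy according to the gold annotation."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at this relevant rank
    return score / hits if hits else 0.0

def mean_average_precision(debates):
    """Mean of the per-debate AP scores."""
    return sum(average_precision(d) for d in debates) / len(debates)

# Two toy debates: ranked sentences with gold check-worthiness labels.
debates = [[1, 0, 1, 0], [0, 1, 0, 0]]
print(round(mean_average_precision(debates), 4))  # → 0.6667
```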

    Overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims. Task 2: Factuality

    We present an overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims, with focus on Task 2: Factuality. The task asked to assess whether a given check-worthy claim made by a politician in the context of a debate/speech is factually true, half-true, or false. In terms of data, we focused on debates from the 2016 US Presidential Campaign, as well as on some speeches during and after the campaign (we also provided translations in Arabic), and we relied on comments and factuality judgments from factcheck.org and snopes.com, which we further refined manually. A total of 30 teams registered to participate in the lab, and five of them actually submitted runs. The most successful approaches used by the participants relied on the automatic retrieval of evidence from the Web. Similarities and other relationships between the claim and the retrieved documents were used as input to classifiers in order to make a decision. The best-performing official submissions achieved a mean absolute error of 0.705 and 0.658 for the English and Arabic test sets, respectively. This leaves plenty of room for further improvement, and thus we release all datasets and the scoring scripts, which should enable further research in fact-checking.
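With the three-way labels placed on an ordinal scale, mean absolute error is straightforward to compute; the numeric coding below is an assumption for illustration, not necessarily the lab's official one:

```python
# Hypothetical ordinal coding of the three factuality labels.
SCALE = {"false": 0, "half-true": 1, "true": 2}

def mae(gold, predicted):
    """Mean absolute error over the ordinal label codes, so confusing
    'true' with 'false' costs twice as much as 'true' with 'half-true'."""
    return sum(abs(SCALE[g] - SCALE[p])
               for g, p in zip(gold, predicted)) / len(gold)

gold = ["true", "half-true", "false", "true"]
pred = ["true", "false", "false", "half-true"]
print(mae(gold, pred))  # → 0.5
```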

    Overview of the CLEF-2018 checkthat! lab on automatic identification and verification of political claims

    We present an overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims. In its starting year, the lab featured two tasks. Task 1 asked to predict which (potential) claims in a political debate should be prioritized for fact-checking; in particular, given a debate or a political speech, the goal was to produce a ranked list of its sentences based on their worthiness for fact-checking. Task 2 asked to assess whether a given check-worthy claim made by a politician in the context of a debate/speech is factually true, half-true, or false. We offered both tasks in English and in Arabic. In terms of data, for both tasks, we focused on debates from the 2016 US Presidential Campaign, as well as on some speeches during and after the campaign (we also provided translations in Arabic), and we relied on comments and factuality judgments from factcheck.org and snopes.com, which we further refined manually. A total of 30 teams registered to participate in the lab, and nine of them actually submitted runs. The evaluation results show that the most successful approaches used various neural networks (esp. for Task 1) and evidence retrieval from the Web (esp. for Task 2). We release all datasets, the evaluation scripts, and the submissions by the participants, which should enable further research in both check-worthiness estimation and automatic claim verification.

    Properties of Graphene: A Theoretical Perspective

    In this review, we provide an in-depth description of the physics of monolayer and bilayer graphene from a theorist's perspective. We discuss the physical properties of graphene in an external magnetic field, reflecting the chiral nature of the quasiparticles near the Dirac point with a Landau level at zero energy. We address the unique integer quantum Hall effects, the role of electron correlations, and the recent observation of the fractional quantum Hall effect in monolayer graphene. The quantum Hall effect in bilayer graphene is fundamentally different from that of a monolayer, reflecting the unique band structure of this system. The theory of transport in the absence of an external magnetic field is discussed in detail, along with the role of disorder studied in various theoretical models. We highlight the differences and similarities between monolayer and bilayer graphene, and focus on thermodynamic properties such as the compressibility, the plasmon spectra, the weak localization correction, the quantum Hall effect, and optical properties. Confinement of electrons in graphene is nontrivial due to Klein tunneling. We review various theoretical and experimental studies of quantum confined structures made from graphene. The band structure of graphene nanoribbons and the role of the sublattice symmetry, edge geometry, and the size of the nanoribbon on the electronic and magnetic properties are very active areas of research, and a detailed review of these topics is presented. Also, the effects of substrate interactions, adsorbed atoms, lattice defects, and doping on the band structure of finite-sized graphene systems are discussed. We also include a brief description of graphane, a gapped material obtained from graphene by attaching hydrogen atoms to each carbon atom in the lattice. Comment: 189 pages; submitted to Advances in Physics.
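The zero-energy Landau level mentioned above follows directly from the Dirac-like dispersion; the standard textbook statement of the monolayer result is:

```latex
% Low-energy quasiparticles in monolayer graphene obey a 2D Dirac
% Hamiltonian, H = v_F \,\boldsymbol{\sigma}\cdot\mathbf{p}, with Fermi
% velocity v_F \approx 10^6\,\mathrm{m/s}. In a perpendicular magnetic
% field B, the Landau levels are not equally spaced:
\[
  E_n = \operatorname{sgn}(n)\, v_F \sqrt{2 e \hbar B\,|n|},
  \qquad n = 0, \pm 1, \pm 2, \ldots
\]
% The n = 0 level sits exactly at zero energy and is shared by electrons
% and holes, which underlies the anomalous ("half-integer") quantum Hall
% sequence \sigma_{xy} = \pm 4\,(n + 1/2)\, e^2/h observed in monolayer
% graphene.
```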

    Clinical features and outcomes of elderly hospitalised patients with chronic obstructive pulmonary disease, heart failure or both

    Background and objective: Chronic obstructive pulmonary disease (COPD) and heart failure (HF) mutually increase the risk of being present in the same patient, especially in older people. Whether or not this coexistence is associated with a worse prognosis is debated. Therefore, using data from the REPOSI register, we evaluated the clinical features and outcomes of a population of elderly patients admitted to internal medicine wards with COPD, HF, or COPD + HF. Methods: We measured socio-demographic and anthropometric characteristics, the severity and prevalence of comorbidities, clinical and laboratory features during hospitalization, mood disorders, functional independence, drug prescriptions, and discharge destination. The primary study outcome was the risk of death. Results: We considered 2,343 elderly hospitalized patients (median age 81 years), of whom 1,154 (49%) had COPD, 813 (35%) HF, and 376 (16%) COPD + HF. Patients with COPD + HF had different characteristics from those with COPD or HF alone, such as a higher prevalence of previous hospitalizations and comorbidities (especially chronic kidney disease), a higher respiratory rate at admission, and a higher number of prescribed drugs. Patients with COPD + HF (hazard ratio (HR) 1.74, 95% confidence interval (CI) 1.16–2.61) and patients with dementia (HR 1.75, 95% CI 1.06–2.90) had a higher risk of death at one year. The Kaplan-Meier curves showed a higher mortality risk in the group of patients with COPD + HF for all causes (p = 0.010), respiratory causes (p = 0.006), cardiovascular causes (p = 0.046), and respiratory plus cardiovascular causes (p = 0.009). Conclusion: In this real-life cohort of hospitalized elderly patients, the coexistence of COPD and HF significantly worsened the prognosis at one year. This finding may help to better define the care needs of this population.
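The Kaplan-Meier curves reported above come from the standard product-limit estimator; a self-contained sketch with invented follow-up data (not the REPOSI cohort):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit survival estimate.
    times[i] is the follow-up time for subject i;
    events[i] is 1 for a death and 0 for a censored observation.
    Returns (time, survival probability) pairs at each death time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve = 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Group all subjects with the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve

# Toy follow-up (months): deaths at 2, 4, 4; censoring at 3 and 5.
times = [2, 3, 4, 4, 5]
events = [1, 0, 1, 1, 0]
print(kaplan_meier(times, events))
```

Comparing two such curves (e.g., COPD + HF vs. COPD alone) with a log-rank test yields the p-values quoted in the abstract.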